[SPARK-18188] Add checksum for shuffle blocks #15894
Test build #68680 has finished for PR 15894 at commit
@Override
public InputStream createInputStream() throws IOException {
public InputStream createInputStream(boolean checksum) throws IOException {
So, is this only for testing? Otherwise it is very expensive to compute, since it requires two passes over the file.
If it is only for testing, shouldn't it live in a package-private method rather than being exposed?
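To make the cost concern concrete: one way to avoid a second pass is to compute the checksum while the bytes are being consumed, for example with `java.util.zip.CheckedInputStream`. The following is only a sketch under that assumption, not the code in this PR; `SinglePassChecksum` and `consumeAndChecksum` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;

// Hypothetical helper, not part of the PR: computes Adler-32 while reading,
// so the data is traversed once instead of twice.
final class SinglePassChecksum {
    private SinglePassChecksum() {}

    /** Drains the stream once and returns the Adler-32 of every byte read. */
    static long consumeAndChecksum(InputStream raw) throws IOException {
        try (CheckedInputStream in = new CheckedInputStream(raw, new Adler32())) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // A real reader would hand buf to its consumer here;
                // the checksum is updated as a side effect of read().
            }
            return in.getChecksum().getValue();
        }
    }
}
```

The caller would compare the returned value with the checksum stored alongside the block once the stream is exhausted, so corruption is only reported after the data has been read.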
This is a good question.
Actually, we already get a checksum along with compression. We could move the decompression a little earlier so that corruption is detected in the block fetcher. I will try that soon.
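As a rough illustration of the idea of relying on the codec's own integrity check, the sketch below decompresses a block eagerly and treats any decoding error as corruption. It uses GZIP from the JDK as a stand-in, because its trailer carries a CRC-32; the codec actually used for shuffle data is not named in this thread. `EagerDecompressCheck` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: GZIP stands in for whatever codec the shuffle uses.
final class EagerDecompressCheck {
    private EagerDecompressCheck() {}

    /** Compresses a byte array with GZIP (which appends a CRC-32 trailer). */
    static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }

    /** Returns true if the block decodes to the end without the codec reporting an error. */
    static boolean decompressesCleanly(byte[] compressed) {
        byte[] buf = new byte[8192];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            while (in.read(buf) != -1) {
                // Drain fully: the CRC-32 trailer is only verified at end of stream.
            }
            return true;
        } catch (IOException e) {
            // Bad header, invalid deflate data, truncation, or trailer mismatch.
            return false;
        }
    }
}
```

The trade-off is that the whole block has to be decoded before it is handed to the reader, which costs memory or a second decode unless the decompressed bytes are kept.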
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.zip.Adler32;
It would be better to abstract the checksum functions into a class so that the algorithm is easier to change in the future.
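One possible shape for such an abstraction is sketched below, assuming callers only need the `java.util.zip.Checksum` interface. `ShuffleChecksums` and its methods are hypothetical names and do not come from the PR.

```java
import java.util.zip.Adler32;
import java.util.zip.Checksum;

// Hypothetical utility: the only place that names the concrete algorithm.
final class ShuffleChecksums {
    private ShuffleChecksums() {}

    /** Swap Adler32 for another Checksum implementation (e.g. CRC32) here only. */
    static Checksum newChecksum() {
        return new Adler32();
    }

    /** Computes the checksum of data[off, off + len). */
    static long compute(byte[] data, int off, int len) {
        Checksum c = newChecksum();
        c.update(data, off, len);
        return c.getValue();
    }

    /** Returns true if the whole array hashes to the expected value. */
    static boolean matches(byte[] data, long expected) {
        return compute(data, 0, data.length) == expected;
    }
}
```

With this layout the writer and the reader both call `ShuffleChecksums`, so changing the algorithm touches one method instead of every call site that imports `Adler32`.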
override def onBlockFetchSuccess(blockId: String, buf: ManagedBuffer): Unit = {
  // Only add the buffer to results queue if the iterator is not zombie,
  // i.e. cleanup() has not been called yet.
  val in = try {
Note: move this out of the callback so that it does not block the network pool.
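The pattern this note points at can be sketched as follows: the callback, which runs on a network thread, only enqueues the fetched bytes, and the expensive validation happens on the thread that consumes the results. This is written in Java for illustration, although the code under review is Scala; `DeferredValidation`, `FetchResult`, and the use of raw byte arrays instead of `ManagedBuffer` are assumptions made for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.zip.Adler32;

// Hypothetical sketch: keeps the fetch callback cheap and validates on the consumer side.
final class DeferredValidation {
    /** A fetched block plus the checksum it is expected to have (illustrative shape). */
    static final class FetchResult {
        final String blockId;
        final byte[] bytes;
        final long expectedChecksum;

        FetchResult(String blockId, byte[] bytes, long expectedChecksum) {
            this.blockId = blockId;
            this.bytes = bytes;
            this.expectedChecksum = expectedChecksum;
        }
    }

    private final BlockingQueue<FetchResult> results = new LinkedBlockingQueue<>();

    /** Runs on a network thread: no I/O or hashing here, just hand the block off. */
    void onBlockFetchSuccess(String blockId, byte[] bytes, long expectedChecksum) {
        results.add(new FetchResult(blockId, bytes, expectedChecksum));
    }

    /** Runs on the consuming thread: waits for a block, then pays the validation cost. */
    byte[] next() throws InterruptedException {
        FetchResult r = results.take();
        Adler32 adler = new Adler32();
        adler.update(r.bytes, 0, r.bytes.length);
        if (adler.getValue() != r.expectedChecksum) {
            throw new IllegalStateException("Corrupt block " + r.blockId);
        }
        return r.bytes;
    }
}
```

Because `next()` is called by the consumer, a slow or failing validation delays only that consumer and never the threads that service network responses.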
Due to the complexity and overhead here, closing this in favor of #15923.
What changes were proposed in this pull request?
TBD
How was this patch tested?
Existing tests.